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ABSTRACT 

The  multilevel  fast  multipole  method  (MLFMM) 
is  an  algorithm  that  has  had  great  success  in  reduc¬ 
ing  the  computational  time  required  to  find  the  so¬ 
lution  to  the  Galerkin  boundary  integral  form  of  the 
Helmholtz  equation.  We  present  a  new  formulation  of 
the  MLFMM  using  Fourier  basis  functions  rather  than 
spherical  harmonics  in  order  to  accelerate  and  sim¬ 
plify  the  time-critical  stages  of  the  algorithm.  With 
modifications  to  the  transfer  function  in  the  precom¬ 
putation  stage  of  the  MLFMM,  the  interpolation  and 
anterpolation  algorithms  become  straightforward  ap¬ 
plications  of  FFT  interpolations  only.  Using  spectral 
methods,  constructive  algorithms  are  derived  to  deter¬ 
mine  a  near-optimal  quadrature  for  a  given  level  in  the 
algorithm  and  an  a-priori  estimate  of  the  integration 
error. 


1.  INTRODUCTION 

Since  its  development  in  (Rokhlin,  1990;  Coifman 
et  al.,  1993;  Rokhlin,  1993;  Engheta  et  al.,  1992;  Ra- 
hola,  1996),  the  MLFMM  has  proven  to  be  a  very  ef¬ 
fective  tool  for  solving  acoustic  and  electromagnetic 
problems.  We  consider  only  the  scalar  wave  equation, 
although  any  results  can  be  extended  to  the  vector  case 
as  described  in  (Chew  et  al.,  2001;  Darve,  2000b).  Ap¬ 
plying  the  boundary  integral  method  to  the  Helmholtz 
equation  results  in  a  dense  linear  system  which  can 
be  solved  by  iterative  methods  such  as  GMRES  or 
BCGSTAB.  These  methods  require  a  dense  matrix- 
vector  product  which,  for  a  direct  implementation,  re¬ 
quires  0(N2)  operations  to  compute.  The  MLFMM 
uses  an  approximation  of  the  dense  matrix  to  perform 
the  product  in  0(N  log  N)  time.  Computations  tak¬ 
ing  advantage  of  this  approximation  are  distributed 
through  an  oct-tree  encompassing  the  domain  of  the 
scatterers  to  achieve  the  improved  asymptotic  running 
time. 

There  are  a  number  of  difficulties  in  implementing 
the  MLFMM,  each  of  which  much  be  carefully  consid¬ 
ered  and  optimized  to  achieve  the  improved  complex¬ 


ity.  The  largest  complication  is  the  quadrature  sam¬ 
pling  rate  must  increase  with  the  size  of  the  box  in 
the  oct-tree,  requiring  interpolation  and  anterpolation 
algorithms  over  the  sphere  to  transform  the  data  be¬ 
tween  quadratures.  Local  algorithms  and  interpolant 
matrix  sparsifications  are  fast,  but  incur  large  errors. 
Fast  transforms  are  available  for  spherical  harmonics, 
but  they  are  approximate,  quite  complicated  to  imple¬ 
ment,  and  not  always  stable.  We  present  the  MLFMM 
and  these  choices  in  the  next  section. 

2.  THE  MULTILEVEL  FAST  MULTIPOLE 
METHOD 


The  MLFMM  improves  the  asymptotic  complexity 
of  the  matrix- vector  multiplication 

_ ^  e2K|Xi—  Xj|  _ ^ 

^  H  ix._x.|V>3  :=  J2  (!) 

for  i,j  =  1, . . . ,  N  from  0(N2)  to  0(N log2  N ).  This 
improvement  is  based  on  the  Gegenbauer  series 

iK|r+r0|  °° 

— —  =  t«£(-l)"(2n+  l)GB(r,r0)  (2) 

lr  +  r°l  „= o 

G„(r,r0)  =  h^\n  |r0|)jn(/c  |r|)Pn(r  •  r0) 

where  Jin'1  is  the  spherical  Hankel  function  of  the  first 
kind,  jn  is  the  spherical  Bessel  function  of  the  first 
kind,  and  Pn  is  the  Legendre  polynomial  of  order  n. 
The  series  converges  absolutely  and  uniformly  for  |r|  < 
|r0|  and  has  been  studied  extensively  in  (Carayol  & 
Collino,  2004;  Darve,  2000a). 


Truncating  the  Gegenbauer  series  at  i  and  using 
the  identity 


e*Kr'sP„(?0  -s)  dS( s)  =  47rz”jn(K|r|)Pn(?  -f0) 


where  the  integral  is  over  the  unit  sphere  S'2  =  {s  £ 
R3  :  |s|  =  1},  then 
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where  we  have  defined  the  Gegenbauer  truncation  er¬ 
ror  ££  and  the  “transfer  function” 

t 

Te,r o0)  =  (4) 

n— 0 

Additionally,  it  will  be  useful  to  note  the  Jacobi- Anger 
series 

oo 

elKT  S  =  ^2  in(2n  +  1  )jn(K  |r|)P„(s  •  r).  (5) 

n— 0 

Consider  two  disjoint  clusters  of  points  {x^  |  *  £  A} 
and  {xi  |  i  £  B}  with  radii  ta  >  fB  >  0  and  centers 
c a  and  cb  respectively.  If  Ic^  —  cb|  >  ta,  then  the 
matrix- vector  product  (1)  is  accelerated  by  using  the 
approximation 

lyr  ~  M-AA  M'ab 
~  [M'ba  M bb_ 

where 


MaA  —  l^ij]i,jeA 


M  ’ab  = 


[  eXK~Ti*Tl  ro(s)e«s.rc3.  dS^ 

Js 2 


-  ieApe-B 


ric  — ca,  Tq  —  cA  c Bi  r  cj  —  Cb  x 


The  improved  asymptotic  complexity  of  the  FMM  is 
achieved  by  constructing  a  tree  of  nodes  {cp},  typically 
an  octree,  which  partitions  the  support  of  the  source 
and  field  points  at  the  finest  level.  First,  the  outgo¬ 
ing  plane-wave  expansions  due  to  the  source  points  (or 
basis  functions)  are  accumulated  at  the  finest  level  in 
the  tree.  These  outgoing  expansions  are  then  aggre¬ 
gated  upward  through  the  tree  via  additional  plane- 
wave  expansions.  Incoming  plane- wave  expansions  are 
computed  from  the  outgoing  by  multiplication  of  the 
transfer  function.  These  incoming  plane-waves  are 
then  disaggregated  downward  through  the  tree.  At 
the  finest  level  the  integration  is  performed  and  accu¬ 
mulated  with  the  near  field  contribution  to  determine 
the  field  value  at  the  field  points  (or  testing  functions). 


Therefore,  we  wish  to  exactly  integrate  the  spherical 
harmonics,  Y™.  Below,  we  enumerate  a  number  of 
choices  for  appropriate  spherical  quadratures  that  have 
previously  been  studied. 

1.  The  simplest  choice  are  sample  points  chosen  uni¬ 
formly  in  9  and  4>.  However,  this  choice  does 
not  accurately  integrate  the  spherical  harmonics 
and  requires  approximately  twice  as  many  points 
as  the  Gauss-Legendre  quadrature  below  (Darve, 
2000b). 

2.  The  most  common  choice  of  sample  points  are  uni¬ 
form  points  for  9  and  Gauss-Legendre  points  for  (f>. 
With  N  + 1  uniform  points  in  the  9  direction  and 

Gauss-Legendre  points  in  the  (j)  direction,  all 
Y™,  —n  <  m  <  n,  0  <  n  <  N  are  integrated  ex¬ 
actly  (Darve,  2000b;  Koc  et  al. ,  1999). 

3.  McLaren  in  (McLaren,  1963)  developed  opti¬ 
mal  choices  of  samples  for  general  functions  on 
S2  based  on  equally  spaced  points  and  derived 
from  invariants  of  finite  groups  of  rotations.  He 
also  proposes  a  method  for  constructing  equally 
weighted  integration  formulas  on  sets  of  any  de¬ 
sired  number  of  points  by  taking  the  union  of 
icosahedral  configurations. 

2.2  Interpolation  in  the  MLFMM 

The  quadrature  sampling  rate  depends  depends  on 
the  spectral  content  of  the  translation  operator,  elKs  r. 
This  term’s  spherical  harmonic  coefficients  e™  in  (7) 
decreases  super-exponentially  roughly  after  the  «|r| 
mode,  which  scales  linearly  with  the  box  size  in  the 
tree.  Therefore,  as  we  go  up  the  tree  in  the  aggregation 
step  and  |r|  becomes  larger,  a  larger  quadrature  and 
corresponding  interpolation  algorithm  are  required  to 
resolve  these  higher  modes.  These  modes  must  be  re¬ 
solved  since  they  interact  with  the  modes  in  the  trans¬ 
fer  function,  which  do  not  decay  as  t  increases. 


2.1  Spherical  Quadrature  in  the  MLFMM 

Noting  the  addition  formula 

n 

(2 n  +  l)P„(p  •  q)  =  47T  £  iMy;(q) 

m=—n 

the  transfer  function  (4)  and  Jacobi-Anger  series  can 
be  expressed  as 

t  n 

Tbr0(s)  =  ^  £  WOO  (6) 

n=0  m=—n 
oo  n 

pKr'S  =  E  £  CF7(s)  (7) 

71=0  m——n 


Similarly,  as  we  go  down  the  tree  in  the  disaggre¬ 
gation  step,  |r|  becomes  smaller,  preventing  the  higher 
modes  of  the  incoming  field  approximation  from  con¬ 
tributing  to  the  integral  as  a  consequence  of  Parseval’s 
theorem.  Thus,  as  the  incoming  field  is  disaggregated 
down  the  tree,  a  smaller  quadrature  can  be  used  to 
resolve  it.  This  makes  the  integration  faster  and  is  ac¬ 
tually  required  to  achieve  an  improved  asymptotic  run¬ 
ning  time  at  the  cost  of  requiring  anterpolation  algo¬ 
rithms.  Below,  we  enumerate  a  number  of  choices  for 
interpolation  and  anterpolation  algorithms  that  have 
previously  been  studied. 
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1.  General  interpolation  algorithms  for  non-uniform 
grid  points  like  Lagrange  interpolation  or  B- 


splines  are  fast  and  provide  for  simple  error  analy¬ 
sis.  In  (Koc  et  al. ,  1999)  it  is  shown  that  the  error 
induced  from  Lagrange  interpolation  decreases  ex¬ 
ponentially  as  the  number  of  interpolation  points 
is  increased  for  a  given  function  of  finite  band¬ 
width.  Thus,  there  is  a  trade-off  between  error 
and  speed.  The  introduction  of  error  is  unavoid¬ 
able  with  local  schemes  and  is  dependent  on  the 
values  of  the  function  at  the  time  of  interpolation. 

2.  For  a  set  of  quadrature  points  (6k,4>k),  k  = 
1, . . . ,  K  with  respective  weights  uik  and  corre¬ 
sponding  function  value  fk ,  a  direct  spherical  har¬ 
monic  transform  sends  fk  to  a  new  quadrature 
(9'k, , <//,),  k'  =  1  via  the  linear  transfor¬ 

mation 


Because  the  integrand  is  continuous  and  periodic,  this 
formulation  allows  the  use  of  the  Fourier  basis  func¬ 
tions  {emee*m^}  which  form  an  L 2  orthonormal  set 
over  L2([0,27t]  x  [0,27t]).  Thus,  we  efficiently  use  two 
dimensional  uniform  quadratures,  fast  Fourier  trans¬ 
forms  in  the  interpolation  and  anterpolation  steps,  and 
spectral  arguments  in  the  error  analysis.  The  result  is 
a  fast  and  easily  implemented  version  of  the  MLFMM 
with  strong  control  over  the  maximum  absolute  error, 
which  we  detail  in  the  following  sections. 

3.1  Computing  the  Modified  Transfer  Function 

Select  a  uniform  quadrature  with  points  ( 0i,4>j ) 
defined  by 


fk'=  E  yri.o'k,A'k-)Y,uj^T^k^k)fk 

m,l<K  k 

This  transform  has  nice  properties  analogous  to 
those  of  the  Fourier  transform.  A  direct  compu¬ 
tation  requires  O(KK')  operations  which  would 
result  in  an  0(N2)  FMM.  Fast  spherical  trans¬ 
forms  (FST)  have  been  developed  in  (Driscoll  & 
Healy,  1994;  Healy  et  al.,  2003;  Rokhlin  &  Tygert, 
2006)  and  applied  to  the  FMM  in  (Chowdhury 
&  Jandhyala,  2006).  Using  the  FST  reduces 
the  interpolation  and  anterpolation  procedures 
to  0(Klog2  K),  which  results  in  a  0(Nlog2  N) 
MLFMM.  However,  the  accuracy  and  stability 
of  these  algorithms  are  questionable  as  they  rely 
on  the  nonequispaced  discrete  Fourier  transform 
and/or  the  fast  Legendre  transform. 

3.  Approximations  of  the  direct  spherical  transform 
have  also  been  investigated  in  (Jakob-Chien  & 
Alpert,  1997;  Darve,  2000b).  The  interpolation 
matrix 

Ak'k=  E 

m,l<K  k 

can  be  sparsified  in  a  number  of  ways  to  pro¬ 
vide  an  interpolation/ anterpolation  method  that 
scales  as  O(K)  with  scalable  relative  error. 

3.  FOURIER  BASED  MLFMM 


8i  =  2-7T —  <bj  =  2i 

Ng  N 0 

Noting  that  the  plane  wave  and  modified  transfer  func¬ 
tion  both  have  spherical  symmetry  since 

s (9,  <j>)  =  s(7r  +  9,2n  —  <t>) 

then  the  computational  and  memory  cost  will  be  re¬ 
duced  if  the  quadrature  is  also  constructed  with  spher¬ 
ical  symmetry  since  only  half  of  the  quadrature  points 
will  be  need  computed  and  stored.  Thus,  we  force 
to  be  a  multiple  of  2  and  Ng  to  be  a  multiple  of  4. 
By  taking  advantage  of  all  of  the  symmetries  of  this 
quadrature  the  number  of  modified  transfer  functions 
that  need  to  be  precomputed  is  reduced  from  316  per 
level  to  34  -  saving  a  factor  of  9.3  in  memory  and  cost¬ 
ing  a  negligible  permutation  of  values  in  time. 


A  key  step  to  computing  T)?  is  to  remove  the  fre¬ 
quencies  which  a  given  quadrature  cannot  resolve.  If 
T/'ro  were  simply  sampled,  significant  frequency  alias¬ 
ing  (or  folding)  would  occur  unless  we  used  an  un¬ 
reasonably  large  quadrature.  This  is  due  to  the  slow 
decay  of  the  Fourier  series  of  |sin(</>)|, 


T(rn\  |sin(</>)  | ) 


(-l)m  +  l 

7r(  1  —  m2) 


/It 

1° 


if  an  even 
if  to  odd 


The  Fourier  based  fast  multipole  method  is  based 
on  the  the  identity 

n  (‘‘2'K  /»27T 

/  e*"r-s7^ro(s)  dS(s)  =  /  /  elKr  sT/ro(s)  dOdtcf 

Js2  Jo  Jo 

(8) 

where  s  =  [cos(0)  sin(^),  sin(0)  sin(^),  cos(</>)]  and 
is  the  “modified  transfer  function” 

L,ro 

r/,ro(M)  =  ^,ro( s)  |sin(0)|  (9) 


Since  the  spectrum  of  the  plane-wave  function 

OO 

„iK,r-s  _2/dr|  cos(0s  r)  \  ^  n  j  /,  \„\\ r) 

e  =  e  1  1  v  s,r;  =  y  1  Jn\K  |r|)e  v  •  ' 

TI—  —  00 

decays  very  rapidly  for  n  >  k  |r|,  the  high  frequencies 
in  Tf  will  not  contribute  to  the  final  integral  as  a 
result  of  Parseval’s  theorem.  By  removing  these  fre¬ 
quencies  from  the  modified  transfer  function,  a  smaller 
quadrature  can  be  used  without  significantly  affecting 
the  total  error. 
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Suppose  we  have  chosen  a  quadrature  character¬ 
ized  by  (N^jNq).  Since  T/  is  band-limited  with 
bandwidth  2i  +  1  in  the  ^-direction,  we  only  need 
to  treat  the  0-direction  carefully.  Noting  that  T0ro 
is  bandlimited  in  0  with  bandwidth  2i  +  1,  only  fre¬ 
quencies  \m\  <  N(p/2  +  t  of  |sin(0)|  contribute  to  the 
N^/2  frequencies  of  T/  resolved  by  the  quadrature. 
Therefore,  to  compute  a  low-pass  T£  we  follow  the 
algorithm  sketched  below: 


where  sn>m  =  [cos(0„)sin(0m),sin(0„)sin(0m),cos(0m)] 
and  sn<m)  is  the  low-pass  modified  transfer  func¬ 
tion  described  in  Sec.  3.1, 

f2ir  p  2ir 

=  l  Jo  ^rs(T^(s)  +  r;f(s))  ded<t> 

N<f,  Ng 

-  E  E  w n,m  r(sn,m)  +  E^r( Sn>m)) 

m=l  n=l 


For  each  0*,  0  <  z  <  A0/2 

k  =  0,...,2£. 

20  <—  Fourier  series  coefficient  of  20. 

Sm  <-  ^(ItoI^  JV^/2  +  £■  |sin(0)|). 

T®  -e-  sm  <g)  20  convolution  of  Fourier  series. 

T*  <—  truncate  to  frequencies  |n|  <  N^/2. 

Ts(di,(f>j )  <—  Inverse  transform  of  T®. 

This  algorithm  yields  the  low-pass  modified  trans¬ 
fer  function  at  (0j,0j),  0  <  i  <  Ng/2,  0  <  j  <  N $ 
which  can  be  unwrapped  to  the  remaining  points  by 
using  the  spherical  symmetry 

(0j,0j)  =  (0ATe/2+i, 


Note  that  this  method  can  also  be  simply  ex¬ 
pressed  in  terms  of  Fourier  interpolations  and  anter- 
polations.  It  is  equivalent  to  making  a  Fourier  inter¬ 
polation  of  20  to  (2£  +  1)  +  {2(N(f>/2  +  l)  +  1)  —  1 
points,  multiplying  by  a  low-pass  |sin(0)|  with  fre¬ 
quencies  \m\  <  Nrj>/2  +  £,  and  performing  a  Fourier 
anterpolation  back  to  points. 


where  ETr'r  consists  of  the  low  frequencies  of  e*Krs 
which  are  resolved  by  the  quadrature  and  E^r  consists 
of  the  high  frequencies  not  resolved  by  the  quadra¬ 
ture.  Since  E^rTg'^Q  is  integrated  exactly  by  a  uniform 
quadrature, 


Eg(s)Tlfo(S)  d9dt 


Ne 

-EE 


m—1  n=  1 


/>27r  /> 

=  /  /  Eg(s)Tlfo(s)-EZ(s)Tl’rLo(s)d6d<l> 

Jo  Jo 

where  we  have  denoted  the  aliased  high  frequencies  of 
elKr  s  as  s), 


2?lP0( s)  d6d$ 


Transforming  to  Fourier  space  and  applying  the  tri¬ 
angle  inequality  to  prevent  unpredictable  cancelation 
effects, 


Finally,  because  sampling  the  transfer  function  at 
a  single  point  is  an  0(£)  operation,  the  algorithm  as 
presented  is  0(£3).  The  computation  of  the  trans¬ 
fer  function  at  all  sample  points  can  be  accelerated 
to  0(£2)  as  in  (Ergul  &  Gurel,  2006)  by  taking  ad¬ 
vantage  of  its  symmetry  about  the  ?o  axis  and  using 
interpolation  algorithms,  but  at  the  cost  of  introducing 
additional  error. 


3.2  Choice  of  Quadrature 


The  quadrature  parameters  can  be  constructively 
determined  by  deriving  the  maximum  error  they  in¬ 
cur.  The  error  in  computing  the  integral  with  uniform 
quadrature  is 


\£i\  = 


n‘2,K  /*27T 

/  /  e*'w-2?iP0( s)  d0d0 

Jo  Jo 

iK>sn  )m’rrrs’L  /  \ 

1£,r0  \~n,m) 


'0  JO 

N 0  Ne 

-EE  ^ 

m—l  n—1 


- 471-2  E  E  E(n’m)  Tlr0(-n,-m) 


n=—t  m—— oo 
?H 


E(n ,  m)  =  E*r(n,  m)  -  E^r(n,  m) 


This  remains  an  accurate  upper  bound  due  to  the  fast 
decay  of  E  for  sufficiently  large  values  of  Ng  and  N^. 
See  Fig.  1.  Furthermore,  this  does  not  use  phase  in¬ 
formation  in  computing  the  error,  which  accounts  for 
more  directional  choices  of  ro  and  r  by  taking  advan¬ 
tage  of  the  shift  theorem. 


3.2.1  Choosing  N $ 


The  worst  case  for  £/  in  terms  of  occurs  when 
r  and  ro  are  aligned  with  the  z-axis.  This  causes  all 
spectral  information  to  be  contained  in  the  0-direction 
and  makes  £/  a  function  of  only.  It  leads  to 


£/|<47t2  J2  E*(0,m)-E*(p,m)  T/ro(0,-m) 
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Frequency 


Fig.  1:  The  value  of 


E?t(0,m)  -  E^r(0,m) 


0.8^  •  100  and  N#  =  318. 


for  k  |r I  = 


the  xy-plane.  Without  loss  of  generality,  suppose  ?  = 
x.  Then  considering  a  constant  <j>  =  tfrj  and  using  the 
J  acobi- Anger  identity,  the  plane  wave  can  be  expressed 
as 


elK,TS  _  inJn{K  |r|  sin(^))einfl 

n—— oo 

and  noting  that  Jn(nsin((f>j)  |r|)  is  exponentially  small 
when 


and  applying  the  Jacobi- Anger  identity, 

OO 

eifts-r  _  Y  inJn(n  \r\)ein,p 

n=— oo 

where  cos(yj)  =  r  •  s,  this  simplifies  this  to 

OO 

M=47T2  |jM(Jv^m)(K|r|)|  f/jro(0,m)  (10) 


m—— oo 


n  ~  0(k  |r|  sin(^j)) 

implying  that  we  can  truncate  the  series  at  Ng(<j>j)  ~ 
0(k  |r|  sin(^j))  without  incurring  any  appreciable  er¬ 
ror.  Estimates  of  Ng(<f>j)  can  also  be  developed 
by  determining  when  Jn(n  |r|  sin(0J-))  is  exponentially 
small,  as  in  the  computation  of  the  excess  bandwidth 
formula  (EBF)  in  (Chew  et  ai,  2001).  However,  we 
find  the  EBF  generated  quadrature  typically  has  an 
overestimated  sampling  rate. 


where 


N<f,  —  \m\  \m\  <  N^/2  —  1 

\m\  \m\  >  N<j,/2  -  1 


(11) 


which  can  be  used  to  efficiently  search  for  a  value 
via  the  following  algorithm  sketched  below. 


First,  it  must  be  noted  that  letting  Ng  be  a  func¬ 
tion  of  <pj  requires  an  additional  step  in  the  computa¬ 
tion  of  the  modified  transfer  function.  Sec.  3.1  com¬ 
puted  the  transfer  function  on  a  A^/2  +  1  x  Ng  grid. 
With  Ng  —>■  Ng{<t>j ),  the  data  computed  for  each  cj)j 
must  be  Fourier  anterpolated  from  length  Ng  to  length 

Ng(<t>j)- 


Choose  N£  sufficiently  larger  than  2^  +  1. 
Tk^Tllrorz(0,^),k  =  0,...,N%-l. 

Tm  <—  absolute  value  of  Fourier  series  of  T \ . 
Em.  \Jm(K  I  r  | )  | . 

For  N<j,  from  2£  to  by  2 
Em  <—  EM(N</)>m). 

If  E*  ■  T  <  e/Att2  ,  return  N^. 


Since  N"£  is  typically  only  a  small  constant  larger  than 
2£  +  1,  the  algorithm  as  presented  is  dominated  by 
the  computation  of  the  0(£)  modified  transfer  function 
values  and  requires  0(£2)  operations.  Important  op¬ 
timizations  include  noting  the  symmetry  E^  =  E*_m 
and  Tm  =  T_m  and  also  taking  advantage  of  the  very 
fast  decay  of  J„  to  accelerate  the  inner  product  by  not 
including  terms  which  cannot  contribute.  More  ad¬ 
vanced  searching  methods  also  provide  improved  per¬ 
formance. 


3.2.2  Choosing  Ng 

After  determining  an  appropriate  N^,  a  significant 
optimization  is  made  by  allowing  Ng  to  be  a  function  of 
</>.  This  significantly  reduces  the  number  of  quadrature 
points  at  negligible  cost  to  the  error.  The  worst  case 
for  the  integration  error  occurs  when  r  and  ro  are  in 


To  accurately  and  reliably  compute  Ng{<j>j)  the 
same  procedure  as  in  Sec.  3.2.1  is  applied  but  with 
r  and  ro  in  the  sy-plane.  This  represents  the  worst 
case  for  the  integration  error  as  a  function  of  Ng .  For 
a  given  <f>j,  we  search  for  a  Ng(<pj)  such  that 


M  <  2tt  ^  \JM(Ng(<f>j),n)(K  I  r  |  sin(</>j))|  T|ro(n;  fa) 

n——l 

(12) 


is  bounded  by  a  prescribed  error.  This  is  accomplished 
via  the  following  sketched  algorithm. 


Set  the  poles  Ng(<j)0)  =  Ng{^N^/2)  =  1. 

Choose  Ng  sufficiently  larger  than  2^  +  1. 

For  (j)j,  j  =  1,. .. ,  NJ2  -  1 

Tx<-Tllrol~(^,h),k  =  0,...,2l 
Tn  <—  absolute  value  of  Fourier  series  of  T \ . 

En  <-  |J„(/c|r|sin(^))|. 

For  Ng  ( (f)j )  from  2  to  Ng  by  2 

En  E M(Ne n)- 

if  E*  -  T  <  e/ 2tt,  save  Ng(<j>j)  and  j  <—  j  +  1. 
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Since  Ng  is  only  a  small  constant  larger  than  2t  +  1, 
the  algorithm  as  presented  is  dominated  by  the  com¬ 
putation  of  the  modified  transfer  function  and  requires 


0(£3)  operations.  Again,  using  more  advanced  search¬ 
ing  methods  accelerates  this  algorithm  and  symmetry 
and  decay  properties  of  E*  and  T  should  be  use  to  im¬ 
prove  the  inner  product  computation.  Using  the  EBF 
as  an  initial  guess  in  the  search  for  Ng(</>j)  further  im¬ 
proves  the  searching  speed.  Additionally,  only  half  of 
the  Ng(<f>j)’s  should  be  computed  due  to  symmetry. 

3.2.3  Choosing  |r|  and  |ro| 

The  previous  algorithms  require  representative 
values  of  |r|  and  |ro|  for  each  level  of  the  tree.  The 
worst-case  transfer  vectors,  ro,  are  well  known  to  be 
those  with  minimum  magnitude.  If  a/  is  the  box  size 
at  level  l,  then  |r0|  =  2a;  is  the  smallest  transfer  vector 
length  in  the  common  one  buffer  box  case. 

The  worst  case  values  of  |r|  is  well  known  to  be 
the  largest.  For  a  box  of  size  a;,  |r|  <  aiV 3.  However, 
using  |r|  =  ai\J 3  in  the  previous  methods  is  a  very 
conservative  choice.  This  case  only  occurs  when  two 
points  are  located  in  the  exact  corners  of  the  boxes  - 
rare  indeed.  See  Fig.  2.  Instead,  we  let  |r|  =  aa;V 3 
for  some  a  £  [0, 1].  A  high  a  will  strongly  guarantee 
an  upper  bound  on  the  error  generated  by  the  quadra¬ 
ture,  but  the  points  which  actually  generate  this  error 
become  more  and  more  rare.  A  lower  value  of  a  will 
yield  a  smaller  quadrature,  but  more  points  may  fall 
outside  the  radius  |r|  where  the  upper  bound  on  the 
error  is  guaranteed.  A  typically  reliable  choice  appears 
to  be  a  =  0.8.  A  study  of  this  parameter  will  appear 
in  a  subsequent  paper. 


1 

\\ 

4\ 

i 

Fig.  2:  The  worst  case  r  and  ro,  projected  from  the 
3D  box.  Here,  jr0|  =  2a;  and  i  and  j  on  at  opposite 
corners  of  the  box  so  that  |r|  =  |ric|  +  |rCJj  =  a;V 3. 

3.2.4  Number  of  Quadrature  Points 

Recall  from  Sec.  2.1  that  the  typical  approach  in 
the  standard  MLFMM  is  to  use  N+l  uniform  points  in 
the  9  direction  and  A±I  Gauss-Legendre  points  in  the 
</>  direction  so  that  all  Y™,  —n  <  m  <  n,  0  <  n  <  N 
are  integrated  exactly.  In  (Koc  et  ai,  1999),  Chew  et. 
al.  takes  A±I  =  £  +  1,  which  accounts  for  the  rapid 
decay  of  the  spherical  harmonics  in  the  plane  wave 
expansion.  This  results  in 

Mg  =  2(1  +  l)2  ft  2£2 


quadrature  points. 

For  a  given  Gegenbauer  series  truncation  £,  the 
total  number  of  active  quadrature  points  required  in 
the  Fourier  based  MLFMM  is  approximately 

AT  1  P  7T  O 

W)  #«(£  +  <?)-( 2^  +  1) 

2  7T  Jg  7T 

where  C  >  1  is  a  small  integer  dependent  on  £  nu¬ 
merically  computed  from  the  method  in  Sec.  3.2.1. 
Keeping  only  the  dominant  term, 

Mr  ft  -£2 

7T 

Thus,  the  method  presented  in  this  paper  uses  approx¬ 
imately  0.6  times  the  number  of  quadrature  points  in 
the  standard  MLFMM.  However,  it  may  be  possible 
that  the  same  Ng  optimization  can  be  applied  to  the 
standard  MLFMM  for  the  same  reasons  it  was  applied 
in  Sec.  3.2.2  to  reduce  the  standard  quadrature  to  a 
comparable  size. 

3.3  Interpolation  and  Anterpolation 

Most  importantly,  the  Fourier  based  MLFMM  di¬ 
rectly  uses  FFTs  in  the  interpolation  and  anterpola¬ 
tion  steps.  This  makes  the  time  critical  upward  pass 
and  downward  pass  especially  fast  and  easy  to  imple¬ 
ment  . 

Characterize  a  quadrature  by  an  array  of  length 

JV  2  +  1, 

Q  =  [i-,Ng(<j>i),  .  .  .  ,  Ng((j)N<l>/2-i),  1] 

noting  that  Ng(<f>j )  =  Ng(<j>N*/2+j)  and  Ng(<pj )  = 
N$ (<j) N(jJ/2—j')‘  A  transform  of  the  data  F(9i,4>j )  sam¬ 
pled  on  a  quadrature  Q  to  a  quadrature  Q’  is  per¬ 
formed  by  a  sequence  of  Fourier  interpolations  and 
anterpolations.  Let 

Mg  =  max(  max  Ng(d)j),  max  Ng(d> A  ) 
0<j<N*/2  0<j<Jv;/2 

Then,  the  following  steps,  as  illustrated  in  Fig.  3,  per¬ 
form  an  exact  interpolation/anterpolation  using  only 
FFTs. 

1.  For  each  <f>j,  0  <  j  <  N^/ 2,  Fourier  interpo¬ 
late  the  data  [F(0i=o 4>j)]  from  length 
Ng^j)  to  Mg. 

2.  For  each  di,  0  <  i  <  Mg/2,  wrap 

the  data  to  construct  the  periodic 

sequence  from  the  rest  of  the  line 

[F(9i,  <t>j=0,...,N4,/2),F(Qi+Me/2i  <t>j=N4,l  2— 
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rQ  ~  x,  eps  =  10' 


3.  For  each  6fy  0  <  i  <  Me/2,  Fourier  interpolate  the 
data  [F(0j,0j=  :o,...,jv*-i)]  from  length  N<f,  to  N'^. 

4.  For  each  9i,  0  <  i  <  A/e/2,  unwrap 

the  data  \F(0i,<t>j= o,...,at^_i)]  to  construct 
the  sequences  4>j=o,...,N,cl>/2)\  and 

[F{0%+m9/ 2,  <^»j=o . JV^/2)]- 

5.  For  each  <f>j,  0  <  j  <  N'^/2,  Fourier  anterpolate 

the  data  0j)]  from  length  A fg 

tO  N'gtyj). 


Fig.  3:  The  data  profile  at  each  step  in  an  anterpo- 
lation  from  a  large  quadrature  Q  with  =  30  to  a 
smaller  quadrature  Q'  with  =  24.  The  data  corre¬ 
sponding  to  a  pole  has  been  darkened  for  clarity. 

4.  NUMERICAL  RESULTS 


A  direct  computation  was  used  to  compute  the 
optimal  Gegenbauer  truncation  £  and  the  methods  de¬ 
scribed  Sec.  3.2  were  used  to  construct  a  quadrature. 
For  a  given  box  size  s,  the  quadrature  and  truncation 
are  constructed  with  |r|  =  O.Ssa/3,  |r0|  =  2s,  and  tar¬ 
get  error  eps.  The  total  measured  error,  e,  is 


eiK|r+r0| 

lr  +  ro| 


N<f>  Ne(<f>m) 

^  ^  ^  ^  ^n,mC 


m— 1  n=  1 


The  total  Gegenbauer  truncation  error,  sq,  is 


|r+r0| 

eG  =  — ; — 7-  -  IK  V'(-l)"(2n  +  l)Gn(r,  r0) 

lr  +  r°l  n=0 

The  total  integration  error,  ej,  is 


£1  =  e  -  eg 


The  plotted  errors  in  Fig.  4  represent  the  maximum 
error  found  over  many  directions  r  and  magnitudes 
| r |  <  0.8s-\/3,  where  s  is  the  box  size. 

Figure  5  shows  the  recorded  running  times  of  the 
Fourier  based  MLFMM  and  the  direct  matrix-vector 
product  on  a  Intel  Core(2)  Quad  CPU  Q9450  2.66GHz 
with  4GB  of  RAM.  Note  that  the  intersection  point  is 
less  than  N  =  10, 000. 


Fig.  4:  Error  of  the  FMM  integral  using  a  direct  com¬ 
putation  of  £  and  quadrature  chosen  as  in  Sec.  3.2. 


5.  CONCLUSIONS 

We  have  proposed  using  a  Fourier  basis  in  the 
spherical  variables  9  and  (j>:  elp9ebq 'A  By  modifying 
the  Helmholtz  kernel  approximation  and  using  a  uni¬ 
form  quadrature  we  can  take  advantage  of  very  fast, 
exact,  and  well-known  FFT  interpolation  and  anterpo- 
lation  methods.  By  exploiting  symmetries,  the  num¬ 
ber  of  uniform  quadrature  points  required  is  almost 
identical  to  the  number  of  Gauss-Legendre  quadrature 
points  typically  used  for  spherical  quadratures  and  can 
be  improved  to  use  even  fewer  points.  However,  the 
Fourier  based  MLFMM  requires  careful  changes  to 
the  precomputation  stage  where  the  modified  trans¬ 
fer  functions,  T/  ( s),  are  computed.  Since  |sin(</>)|  is 
not  smooth,  we  must  accurately  precompute  a  band- 
limited  version  of  the  modified  transfer  function. 

The  Fourier  based  MLFMM  approach  has  a  num¬ 
ber  of  advantages.  Since  the  interpolation  and  anter- 
polation  algorithms  are  exact,  significant  errors  in  the 
algorithm  are  a  function  of  only  the  truncation  pa¬ 
rameter  £  and  the  quadrature,  specifically  the  band¬ 
width  in  the  (/-direction.  The  truncation  error  e ?  has 
been  extensively  studied  and  is  well  understood.  The 
integration  error  £j  can  be  accounted  for  during  the 
precomputation  stage  and  precise  bounds  on  the  fi¬ 
nal  error  can  be  made.  The  error  analysis  relies  on 
well-known  properties  of  the  Fourier  basis.  We  have 
used  the  results  in  constructive  algorithms  to  deter¬ 
mine  the  appropriate  quadrature  and  constrain  the 
MLFMM  error  a-priori.  We  find  that  bounding  inte¬ 
gration  error  is  accurate  and  efficiently  exploitable  to 
search  for  the  optimal  quadrature.  Although  not  re¬ 
quired  in  the  Fourier  based  MLFMM,  computing  the 
optimal  quadrature  can  aid  in  error  analysis  and  im- 
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Fig.  5:  Running  times  of  the  Fourier  Based  MLFMM 
with  k  ~  iV1/3  such  that  for  TV  =  4  x  106  the  points 
are  uniformly  distributed  in  a  cube  with  side  length 
64A. 

prove  performance.  Since  the  Fourier  Based  MLFMM 
uses  a  uniform  quadrature  and  well-known  FFT  algo¬ 
rithms,  the  time-critical  stages  of  the  algorithm  are 
much  easier  to  implement.  Modern  FFT  packages  are 
very  common,  extremely  fast,  numerically  exact,  and 
easy  to  use.  FFTs  scale  much  better  than  other  fast  in¬ 
terpolation  algorithms  of  constant  accuracy.  Although 
the  asymptotic  complexity  is  not  improved,  the  smaller 
constant  in  the  FFT’s  0(N  log  N)  complexity  yields 
improved  running  times  in  relation  to  algorithms  of 
comparable  accuracy. 
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